The best argument against Anova is to show how the analysis will look like if we used parameter estimation instead. With complex experimental design this means that we will use regression. Most psychologists think of linear regression. However, the approach extends to general linear models of the sort $y=f(b_0+b_1\cdot x_1 + b_2\cdot x_2 \dots)$. Linear regression is just a special case when $f(\cdot)$ is the identity function $y=x$.

When the outcome variable takes binary values $0,1$ then it is better to use logistic regression than the linear regression. Here the link function is $f(x)= 1/(1+e^{-x})$ is the logistic function. As with any kind of data the default approach in psychology research is to analyze binary outcomes with Anova. Let's see what are the consequences.

In our first demonstration we will take a look at a study by Mussel and colleagues published in JDM.

The study investigated whether a smiling/angry/neutral face influences collaboration in an ultimatum game (i.e. prisoner's dilemma with no iteration). The subjects were shown facial expression of the opponent avatar and the amount of money he offered. Then they were asked whether they wish to collaborate. The oppenet could propose the share a fraction from the total of 14 cents. The offers ranged from 7c (overly fair) to 1c (unfair) in 1c steps and subjects decided whether they accept the offer or not. The authors also varied whether the face was male or female. The combination of these factors gives us 2x3x7=42 stimuli. Each of 1326 subjects was shown all 42 stimuli in random order and either collaborated (1) or not (0). Trial ended if subjects failed to respond within a 3 second time limit. In the data set these trials were noted as missing values. Next, authors replaced missing values with subject-wise averages. They then analysed the data with a repeated measures Anova with two factors: fairness (7 levels) and face expression (3 levels). As the high N suggests both main-effects and their interaction were significant. Also all post-hoc comparisons were significant (p<.05, bonf. corrected). Smiling conditions had higher collaboration rate than neutral and neutral was in turn higher than angry. When subjects were offered more money they collaborated more. So what can we conclude? Not much. Smiling face increases the offer acceptance. More generous offer does also increase acceptance. No clue where the interaction comes from.

With N=1326 the $p$ values are almost entirely driven by the large sample size. But of course the magnitude of the effect matters. If the collaboration rate is 70 % for smiling faces and 40% for neutral than this is notable. But if the rates are 51 and 50% respectively we don't really care. If the collaboration rates were 95% and 90% we would consider the study inconclusive due to ceiling effects. $p$ values are inherently incapable of providing this information. We need effect size estimates.

We are fortunate. Mussel et al. are good academic citizens and report the standardized effect size.

$$ \eta_{\mathrm{face}}=.1 $$$$ \eta_{\mathrm{money}}=.39 $$$$ \eta_{\mathrm{money}\times \mathrm{face}}=.01 $$

Are we smarter now? How should we interpret these effect sizes? Is $ \eta_{\mathrm{face}}=.1 $ more like the 70%-40% difference or more like the 51%-50% difference? Presumably, the last question is not what we are supposed to ask. Rather, we need to survey the decision making literature to see what are the usual effect sizes and take these as a benchmark for comparison.. Taking the standards from Fritz et al. (2011) we would say that the facial expression has small effect while money has medium effect on acceptance. The interaction shows minuscule effect size and should be presumably discared.

One difficulty with this interpretation is that the sums of squares on which $\eta$ is based depend on the variance of the predictors. The effect size could be improved by using fewer money offer levels. Similar, the addition of face gender factor influences each of the effect size estimates above, because it presumably increases the error variance. As such effect sizes can't be compared across studies, not even across replications if these aren't exact.

The only reason why most of the papers with of the Anova analyses are worth reading after all, is that they include a figure with plotted group averages. Mussel et al. show mercy and give us the following graph.

Ok. Now we see it. The interaction is due to ceiling/floor effects for large and small sums respectively. We are really interested in what happens in the middle. Here, the extra smile increases (compared to neutral face) the acceptance by 5-10%. Angry face shows an effect of similar magnitude but goes in the opposite direction.

Why is it not possible to do this kind of analysis formally? It is. But we need to something else than Anova.

We formulate a regression model with acceptance ($\mathrm{coop}$) as outcome and face expression ($\mathrm{face}$) and offered money sum ($\mathrm{fair}$) as predictors.

$$\mathrm{coop}_{i,j} \sim \mathrm{Bern}(\pi_{i,j})$$$$\pi_{i,j} = \mathrm{logit}^{-1}(\alpha_{\mathrm{face}[i,j]}+\beta_{\mathrm{face}[i,j]}\mathrm{fair}[i,j])(1-\gamma_{\mathrm{face}[i,j]}-\delta_{\mathrm{face}[i,j]})+\gamma_{\mathrm{face}[i,j]}$$

We enter money sum as continuous predictor of acceptance at each trial $j$ for each subject $i$. We fit a separate model for each type of face expression and for each subject. $\mathrm{face}[i,j]$ is the index of the face expression which subject $i$ saw at trial $j$. The parameters $\gamma$ and $\delta$ determine the level (of acceptance) where the logit function becomes flat. Without this addition the logit curve would become flat at $[0,1]$. We can see from the figure from the paper that this is not the case.

Let's look at how the logistic function works by playing around with its parameters.


In [1]:
from scipy.stats import scoreatpercentile as sap
clr=(0.2, 0.5, 0.6)
%pylab inline
plt.figure(figsize=(6,4))
plt.subplot(2,2,1);plt.title('alpha')
x=np.arange(1,9,0.1)
for a in np.arange(2,9).tolist():
    y=1/(1+exp(a-1*x))*1+0
    plt.plot(x,y,'b')
plt.grid(b=False,axis='x');plt.ylim([0,1]);plt.xlim([1,8])
plt.subplot(2,2,2);plt.title('beta')
for b in np.arange(-1,3,0.5).tolist():
    y=1/(1+exp(4.5*2**b-2**b*x))*1+0
    plt.plot(x,y,'b')
plt.grid(b=False,axis='x');plt.ylim([0,1]);plt.xlim([1,8])
plt.subplot(2,2,3);plt.title('gamma')
for c in np.arange(0,0.8,0.1).tolist():
    y=1/(1+exp(4.5-x))*(1-c)+c
    plt.plot(x,y,'b'); 
plt.grid(b=False,axis='x');plt.ylim([0,1]);plt.xlim([1,8])
plt.subplot(2,2,4);plt.title('delta')
for d in np.arange(0,0.8,0.1).tolist():
    y=1/(1+exp(4.5-x))*(1-0.1-d)+0.1
    plt.plot(x,y,'b'); 
plt.grid(b=False,axis='x');plt.ylim([0,1]);plt.xlim([1,8]);


Populating the interactive namespace from numpy and matplotlib
/usr/lib/pymodules/python2.7/matplotlib/figure.py:1533: UserWarning: This figure includes Axes that are not compatible with tight_layout, so its results might be incorrect.
  warnings.warn("This figure includes Axes that are not "

We see that $\alpha$ shifts the function on the x axis. $\beta$ alters the steepness of the function. As already mention $\gamma$ and $\delta$ determine the floor and the ceiling of the function.

Next let's perform the analysis. We first load the data.


In [2]:
from urllib import urlopen
f=urlopen('http://journal.sjdm.org/12/12817/data.csv')
D=np.loadtxt(f,skiprows=3,delimiter=',')[:,7:]
f.close()
D.shape


Out[2]:
(1326, 42)

In [3]:
# predictors
vfair=np.array(([range(1,8)]*6)).flatten() # in cents
vmale=np.ones(42,dtype=int); vmale[21:]=0
vface=np.int32(np.concatenate([np.zeros(7),np.ones(7),np.ones(7)*2]*2))
# anova format
sid=[];face=[];fair=[]
for i in range(D.shape[0]):
    for j in range(D.shape[1]):
        sid.append(i)
        #face.append(['angry','neutral','smile'][vface[j]])
        face.append(vface[j])
        fair.append(vfair[j])
coop=D.flatten()
sid=np.array(sid)
face=np.array(face)
fair=np.array(fair)
assert np.all(coop[:42]==D[0,:])
print coop.size,len(sid),len(face),len(fair)
print D.shape


55692 55692 55692 55692
(1326, 42)

It is good to do some manual fitting to see just how the logistic curve behaves but also to assure ourselves that the we can get a shape similar the pattern of our data.


In [4]:
D[D==2]=np.nan
R=np.zeros((3,7))
for i in np.unique(vface).tolist():
    for j in np.unique(vfair).tolist():
        sel=np.logical_and(i==vface,j==vfair)
        R[i,j-1]=np.nansum(D[:,sel])/(~np.isnan(D[:,sel])).sum()
for i in range(3):
    y=1/(1+exp(4.5-1.2*np.arange(1,8)))*0.5+[0.4,0.44,0.47][i]
    plt.plot(range(1,8),y,':',color=['b','r','g'][i])
    plt.plot(range(1,8),R[i,:],color=['b','r','g'][i])
plt.legend(['Data Angry','Model Angry','Data Neutral',
    'Model Neutral','Data Smile','Model Smile'],loc=4);


Above I fitted the logistic curve to the data. I got the parameter values by several iterations of trial-and-error. We want to obtain precise estimates and also we wish to get an idea about the uncertainty of the estimate. We implement the model in STAN.


In [5]:
import pystan

model = """
data {
    int<lower=0> N;
    int<lower=0,upper=1> coop[N]; // acceptance
    int<lower=0,upper=8> fair[N]; // fairness
    int<lower=1,upper=3> face[N]; // face expression
}
parameters {
    real<lower=-20,upper=20> alpha[3];
    real<lower=0,upper=10> beta[3];
    simplex[3] gamm[3];

}
transformed parameters{
    vector[N] x;
    for (i in 1:N)
        x[i]<-inv_logit(alpha[face[i]]+beta[face[i]]*fair[i])
            *gamm[face[i]][3]+gamm[face[i]][1];     
}
model {
    coop ~ bernoulli(x);
}
"""
#inpar=[{'alpha':[-4.5,-4.5,-4.5],'beta':[1.2,1.2,1.2],
#        'gamma':[0.4,0.44,0.47],'delta':[0.1,0.06,0.03]}]*4
sm = pystan.StanModel(model_code=model)


INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_b138e62c537965b343fa6a7f909fc2d7 NOW.

Run it!


In [6]:
dat = {'N': coop.size,'coop':np.int32(coop),'fair':fair,'face':face+1,'sid':sid}
seed=np.random.randint(2**16)
fit=sm.sampling(data=dat,iter=6000,chains=4,thin=5,warmup=2000,n_jobs=4,seed=seed)
print pystan.misc._print_stanfit(fit,pars=['alpha','beta','gamm'],digits_summary=2)
w= fit.extract()
np.save('alpha',w['alpha'])
np.save('gamm',w['gamm'])
np.save('beta',w['beta'])
del w
del fit


INFO:pystan:NOW ON CHAIN 0
INFO:pystan:NOW ON CHAIN 1
INFO:pystan:NOW ON CHAIN 2
INFO:pystan:NOW ON CHAIN 3

Here are the results.


In [6]:
#w=fit.summary(pars=['alpha','beta','gamm'])
#np.save('logregSummary.fit',w)
#w=np.load('logregSummary.fit.npy')
#w=w.tolist()
a=np.load('m1alpha.npy')
b=np.load('m1beta.npy')
g=np.load('m1gamm.npy')[:,:,0]
d=np.load('m1gamm.npy')[:,:,1]
#D[D==2]=np.nan
for i in range(3):
    x=np.linspace(1,7,101)
    y=1/(1+exp(-np.median(a[:,i])-np.median(b[:,i])*x))*np.median(1-g[:,i]-d[:,i])+np.median(g[:,i])
    plt.plot(x,y,':',color=['b','r','g'][i])
    plt.plot(range(1,8),R[i,:],color=['b','r','g'][i])
plt.legend(['Data Angry','Model Angry','Data Neutral',
    'Model Neutral','Data Smile','Model Smile'],loc=4);
#for j in range(lp.size): print '%.3f [%.3f, %.3f]'%(prs[j],lp[j],up[j])


Out[6]:
<matplotlib.legend.Legend at 0x4804110>

The model fits quite well. We now look at the estimated values for different face conditions.


In [7]:
D=np.concatenate([a,b,g,1-d,1-g-d],1)
print D.shape
for n in range(D.shape[1]):
    plt.subplot(2,3,[1,2,4,5,6][n/3])
    k=n%3
    plt.plot([k,k],[sap(D[:,n],2.5),sap(D[:,n],97.5)],color=clr)
    plt.plot([k,k],[sap(D[:,n],25),sap(D[:,n],75)],color=clr,lw=3,solid_capstyle='round')
    plt.plot([k],[np.median(D[:,n])],mfc=clr,mec=clr,ms=8,marker='_',mew=2)
    plt.xlim([-0.5,2.5])
    plt.grid(b=False,axis='x')
    plt.title(['alpha','beta','gamma','delta','1-gamma-delta'][n/3])
    plt.gca().set_xticks([0,1,2])
    plt.gca().set_xticklabels(['angry','neutral','smile'])


(3200, 15)

The estimates show what we already more-or-less inferred from the graph. The 95% interval for $\alpha$ and $\beta$ coefficients are overlapping and we should consider model with the same horizontal shift and steepness for each of the face conditions. We see that the $\gamma$ and $\delta$ vary between the conditions. To better understand what is happening consider the width of the acceptance band in each condition given by $1-\gamma-\delta$ shown in the right bottom panel. From the figure it looks like all three curves occupy a band of the same width. The estimation confirms this for the case of neutral and smile condition whose estimates overlap almost perfectly. In the angry condition it is not clear where the bottom floor of the logit curve is located. The curve is still linear for lower offers. This means that a) the $1-\gamma-\delta$ estimate is larger in angry condition and b) the estimate is more uncertain. We can reasonably argue that $1-\gamma-\delta$ should be equal across conditions and that discrepant estimate for angry condition is due to error or some strange money-face interaction which we are not interested in. We end up with the following model.

$$\mathrm{coop}_{i,j} \sim \mathrm{Bern}(\pi_{i,j})$$$$\pi_{i,j} =\mathrm{logit}^{-1}(\alpha+\beta\cdot\mathrm{fair}[i,j])\cdot \nu+\gamma_{\mathrm{face}[i,j]}$$

In [18]:
import pystan

model = """
data {
    int<lower=0> N;
    int<lower=0,upper=1> coop[N]; // acceptance
    int<lower=0,upper=8> fair[N]; // fairness
    int<lower=1,upper=3> face[N]; // face expression
}
parameters {
    real<lower=-20,upper=20> alpha;
    real<lower=0,upper=10> beta;
    real<lower=0,upper=1> gamm[3];
    real<lower=0,upper=1> delt[3];
    

}
transformed parameters{
    vector[N] x;
    vector[3] gamma[3];
    for (i in 1:3){
        gamma[i][1]<-gamm[i];
        gamma[i][2]<-delt[i];
        gamma[i][3]<-1-gamm[i]-delt[i];
    }
    for (i in 1:N)
        x[i]<-inv_logit(alpha+beta*fair[i])
            *gamma[face[i]][3]+gamma[face[i]][1];     
}
model {
    coop ~ bernoulli(x);
}
"""
sm = pystan.StanModel(model_code=model)


INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_136eb1c03490516c519e5c7e31e39205 NOW.

In [ ]:
dat = {'N': coop.size,'coop':np.int32(coop),'fair':fair,'face':face+1,'sid':sid}
seed=np.random.randint(2**16)
fit=sm.sampling(data=dat,iter=5000,chains=4,thin=5,warmup=2000,n_jobs=4,seed=seed)
outpars=['alpha','beta','gamm','delt']
print pystan.misc._print_stanfit(fit,pars=outpars,digits_summary=2)
w= fit.extract()
for op in outpars: np.save(op,w[op])
del w
del fit

In [11]:
a=np.load('alpha.npy')
b=np.load('beta.npy')
g=np.load('gamm.npy')
d=np.load('delt.npy')
#D[D==2]=np.nan
for i in range(3):
    x=np.linspace(1,7,101)
    y=1/(1+exp(-np.median(a)-np.median(b)*x))*np.median(1-g[:,i]-d[:,i])+np.median(g[:,i])
    plt.plot(x,y,':',color=['b','r','g'][i])
    plt.plot(range(1,8),R[i,:],color=['b','r','g'][i])
plt.legend(['Data Angry','Model Angry','Data Neutral',
    'Model Neutral','Data Smile','Model Smile'],loc=4);



In [14]:
from scipy.stats import scoreatpercentile as sap

print g.T.shape, np.atleast_2d(d).shape
D=np.concatenate([np.atleast_2d(a),np.atleast_2d(b),np.atleast_2d(b),g.T,1-d.T,1-g.T-d.T],0).T
print D.shape
for n in range(D.shape[1]):
    plt.subplot(2,3,[1,2,4,5,6][n/3])
    k=n%3
    plt.plot([k,k],[sap(D[:,n],2.5),sap(D[:,n],97.5)],color=clr)
    plt.plot([k,k],[sap(D[:,n],25),sap(D[:,n],75)],color=clr,lw=3,solid_capstyle='round')
    plt.plot([k],[np.median(D[:,n])],mfc=clr,mec=clr,ms=8,marker='_',mew=2)
    plt.xlim([-0.5,2.5])
    plt.grid(b=False,axis='x')
    plt.title(['alpha-beta','gamma','delta','1-gamma-delta'][n/3])
    plt.gca().set_xticks([0,1,2])
    plt.gca().set_xticklabels(['angry','neutral','smile'])


(3, 2400) (2400, 3)
(2400, 12)

Furthermore, we are concerned about the fact the the comparison across conditions is done within-subject and that the observed values are not independent. We extend the model by fitting separate logistic model to each subject. In particular, we estimate a separate $\gamma$ parameter for each subject i.e. $\gamma_{i,\mathrm{face}[i,j]}$. We use hierarchical prior that pools the estimates across subjects and also takes care of the correlation between conditions.

$$ \begin{bmatrix} \gamma_{i,s} \\ \gamma_{i,n} \\ \gamma_{i,a} \end{bmatrix} \sim \mathcal{N} \Bigg( \begin{bmatrix} \mu_s \\ \mu_n \\ \mu_a \end{bmatrix} ,\Sigma \Bigg)$$

where $$ \Sigma= \begin{pmatrix} \sigma_s^2 & \sigma_s r_{sn} \sigma_n & \sigma_s r_{sa} \sigma_a \\ \sigma_s r_{sn} \sigma_n & \sigma_n^2 & \sigma_n r_{na} \sigma_a \\ \sigma_s r_{sa} \sigma_a & \sigma_n r_{na} \sigma_a & \sigma_a^2 \\ \end{pmatrix} $$ For each condition we are estimating population mean $\mu$ and population variance $\sigma^2$. Furthermore, we estimate correlation $r$ for each pair of conditions. As a consequence the estimate $\mu$ are not confounded by the correlation.


In [5]:
import pystan

model = """
data {
    int<lower=0> N;
    int<lower=0> M; // number of subjects
    int sid[N]; // subject identifier
    int<lower=0,upper=1> coop[N]; // acceptance
    int<lower=0,upper=8> fair[N]; // fairness
    int<lower=1,upper=3> face[N]; // face expression
    
}
parameters {
    real<lower=-20,upper=20> alpha;
    real<lower=0,upper=10> beta;
    vector<lower=0,upper=1>[3] gamm[M];
    real<lower=0,upper=1> delt;
    vector<lower=0,upper=1>[3] mu;
    vector<lower=0,upper=1>[3] sigma;
    vector<lower=-1,upper=1>[3] r;
    

}
transformed parameters{
    vector[N] x;
    vector[3] gammt[3,M];
    matrix[3,3] S;
    for (i in 1:3) S[i,i]<-square(sigma[i]);
    S[1,2]<- sigma[1]*r[1]*sigma[2];S[2,1]<-S[1,2];
    S[1,3]<- sigma[1]*r[2]*sigma[3];S[3,1]<-S[1,3];
    S[2,3]<- sigma[3]*r[3]*sigma[2];S[3,2]<-S[2,3];
    for (m in 1:M){
        for (i in 1:3){
            gammt[i][m][1]<-gamm[m][i];
            gammt[i][m][3]<-delt;
            gammt[i][m][2]<- 1- gammt[i][m][1]-gammt[i][m][3];
    }}
    for (i in 1:N)
        x[i]<-inv_logit(alpha+beta*fair[i])
            *gammt[face[i]][sid[i]][3]+gammt[face[i]][sid[i]][1];     
}
model {
    for (i in 1:M) gamm[i]~multi_normal(mu,S);
    coop ~ bernoulli(x);
}
"""
sm = pystan.StanModel(model_code=model)


INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_3ae78a9281ca6e69bcf1ed2148a57829 NOW.

In [ ]:
dat = {'N': coop.size,'coop':np.int32(coop),'fair':fair,'face':face+1,'sid':sid+1,'M':1326}
seed=np.random.randint(2**16)
fit=sm.sampling(data=dat,iter=5000,chains=4,thin=5,warmup=2000,n_jobs=4,seed=seed)
outpars=['alpha','beta','delt','mu','sigma','r']
print pystan.misc._print_stanfit(fit,pars=outpars,digits_summary=2)
w= fit.extract()
for op in outpars: np.save(op,w[op])
del w
del fit

Once we have found the appropriate model we can look at the contrasts of interest. In our case we are interested in $\mu_\mathrm{smile}-\mu_\mathrm{neutral}$ and $\mu_\mathrm{angry}-\mu_\mathrm{neutral}$. The former is TODO while the latter is TODO. This is the effect size that we should be interested in. Note how the estimate goes beyond simple contrast that just computes the mean difference between the angry and neutral condition. The model is a device that allows us to extract the quantity from the data. Our model takes care of missing values, of imballanced groups (due to missing values). It accounts for the ceiling and floor effects. In Anova that assumes linear trends these showed up as a significant correlation. We saw no such interaction. The model also took care of the correlation of subject's performance in different groups. On the other hand it seems rather redundant to estimate the $\alpha$ and $\beta$ parameters. In this these did not differ noticably between the conditions but in other context they may provide interesting insight. Curiously even in our context we can find use for them. $\beta$ expresses the increase in acceptance rate for each unit of money offered. We are mostly interested in the increase in the range between 2c and 6c. Here the curve is approximately linear with slope $\beta/4$. We can use this fact to ask a following question. The paper Mussel et al. bears the title "What is the values of a smile". The implication of the title is that smile has a similar influence on the acceptance rate as a sum of money does. We can reformulate this question in the form of a following counterfactual. What is the sum of money we would need to offer a subject who saw neutral face so that his acceptance rate reaches a level it would have if he saw a smiling face. This quantity is given by $4(\mu_\mathrm{smile}-\mu_\mathrm{neutral})/\beta$. The value of the smile is . This result is valid for offers in the range where the logistic function is approximately linear (i.e. 2c and 6c). This is the middle range of the tested values and obviously the range in which the authors were interested in. If we assume that people behave similarly whether the total sum is 14c, 14 EUR, 14 or in fact any sum then we can say that the value of a smile as \% of the equal share. This quantity is informative. Compare it to the $\eta$. It doesn't depend on the number of conditions. For instance the estimate of value of a smile is independent on the fact that we included an angry condition in the experiment. The quantity is directly expressed in units we well understand (compare to squared unitless quantity). Finally, the quantity has causal intepretation. This is an important fact to which I will return in my latter posts.


In [ ]: